Statistical Science 

2010, Vol. 25, No. 4, 517-532 

DOI: 10.1214/10-STS344 

(c) Institute of Mathematical Statistics, 2010 



Block-Conditional Missing at Random 
Models for Missing Data 

Yan Zhou, Roderick J. A. Little and John D. Kalbfleisch 






Abstract. Two major ideas in the analysis of missing data are (a) the 
EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. 
B 39 (1977) 1-38] for maximum likelihood (ML) estimation, and (b) 
the formulation of models for the joint distribution of the data Z and 
missing data indicators M, and associated "missing at random" (MAR) 
condition under which a model for M is unnecessary [Rubin, Biometrika 
63 (1976) 581-592]. Most previous work has treated Z and M as sin- 
gle blocks, yielding selection or pattern-mixture models depending on 
how their joint distribution is factorized. This paper explores "block- 
sequential" models that interleave subsets of the variables and their 
missing data indicators, and then make parameter restrictions based on 
assumptions in each block. These include models that are not MAR. We 
examine a subclass of block-sequential models we call block-conditional 
MAR (BCMAR) models, and an associated block-monotone reduced 
likelihood strategy that typically yields consistent estimates by selec- 
tively discarding some data. Alternatively, full ML estimation can often 
be achieved via the EM algorithm. We examine in some detail BCMAR 
models for the case of two multinomially distributed categorical vari- 
ables, and a two block structure where the first block is categorical 
and the second block arises from a (possibly multivariate) exponential 
family distribution. 

Key words and phrases: Block-sequential missing data models, block-conditional
MAR models, EM algorithm, categorical data.

Missing values arise in empirical studies for many
reasons, including unavailability of the measurements, 
respondents refusing to answer certain items on a 
questionnaire, and attrition in longitudinal studies. 
Complete case (CC) analysis, which omits informa- 
tion in the cases with missing values, is inefficient 
and potentially biased, especially if the subjects in- 
cluded in the analysis are systematically different 
from those excluded in terms of one or more key vari- 
ables. Approaches that incorporate information in 
the incomplete cases include nonresponse weighting
(Little and Rubin, 2002, Chapter 3); multiple imputation
(MI), where missing values are replaced by
multiple sets of plausible values (Rubin, 1987; Little
and Rubin, 2002, Chapter 5); weighted estimating
equation (WEE) methods (Lipsitz, Ibrahim and
Zhao, 1999); and methods based on the likelihood
for a model for the data, such as maximum likelihood
(ML) or fully Bayes modeling. We focus here
on the ML approach, although our models could also 
be analyzed using Bayesian or MI methods. 

Rubin's (1976) theory on modeling the missing- 
data mechanism was a key development in estima- 
tion with incomplete data. Rubin (1976) formalized 
the concept of missing-data mechanisms by treat- 
ing the missing-data indicators as random variables 
and assigning them a distribution. Specifically, let 
$Z = (Z_{ij})$ denote a rectangular $n \times p$ data set; the
$i$th row is $Z_i = (Z_{i1}, \ldots, Z_{ip})$, where $Z_{ij}$ is the $j$th
observation for subject $i$. Let $M = (M_{ij})$ be a missing-data
indicator matrix with $i$th row $M_i = (M_{i1}, \ldots, M_{ip})$,
such that $M_{ij}$ is 1 if $Z_{ij}$ is missing
and $M_{ij}$ is 0 if $Z_{ij}$ is present. We assume that
$(Z_i, M_i)$, $i = 1, \ldots, n$, are independent and identically
distributed. In Rubin (1976), the joint distribution
is factored as

(1.1)  $f(Z_i, M_i \mid \theta, \psi) = f(Z_i \mid \theta) \, f(M_i \mid Z_i, \psi),$

where $f(Z_i \mid \theta)$ represents the model for the data without
missing values, $f(M_i \mid Z_i, \psi)$ models the missing-data
mechanism, and $(\theta, \psi)$ denotes unknown parameters.
When missingness does not depend on the
values of the data $Z$, missing or observed, that is, if

$f(M_i \mid Z_i, \psi) = f(M_i \mid \psi)$ for all $Z_i, \psi,$

the data are called missing completely at random
(MCAR). With the exception of some planned missing-data
designs, MCAR is a strong assumption, and
missingness often depends on the observed and/or
unobserved data. Let $Z_{\mathrm{obs},i}$ denote the observed component
of $Z_i$ and $Z_{\mathrm{mis},i}$ the missing component. A
less restrictive assumption is that missingness depends
only on the observed values $Z_{\mathrm{obs},i}$, and not
on the missing values $Z_{\mathrm{mis},i}$. That is,

$f(M_i \mid Z_i, \psi) = f(M_i \mid Z_{\mathrm{obs},i}, \psi)$ for all $Z_{\mathrm{mis},i}, \psi.$

The missing-data mechanism is then called missing 
at random (MAR) . The mechanism is called missing 
not at random (MNAR) if the distribution of M 
depends on the missing values in the data matrix Z. 
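
To make the three mechanisms concrete, the following minimal sketch computes what a complete-case analysis would estimate in the large-sample limit for a hypothetical 2 x 2 table. The cell probabilities, the missingness probabilities and the function names are illustrative assumptions, not values from this paper, and the first variable is assumed to be always observed.

```python
def complete_case_margin(theta, p_miss):
    """Limiting complete-case estimate of P(Z2 = k).

    theta[j][k]  = P(Z1 = j, Z2 = k)
    p_miss[j][k] = P(Z2 missing | Z1 = j, Z2 = k); Z1 is assumed always observed.
    """
    J, K = len(theta), len(theta[0])
    observed = [sum(theta[j][k] * (1.0 - p_miss[j][k]) for j in range(J))
                for k in range(K)]
    total = sum(observed)
    return [o / total for o in observed]


def complete_case_conditional(theta, p_miss):
    """Limiting complete-case estimate of P(Z2 = k | Z1 = j)."""
    rows = []
    for th_row, pm_row in zip(theta, p_miss):
        observed = [t * (1.0 - p) for t, p in zip(th_row, pm_row)]
        total = sum(observed)
        rows.append([o / total for o in observed])
    return rows


# Hypothetical joint distribution; the true margin of Z2 is (0.5, 0.5).
THETA = [[0.4, 0.1],
         [0.1, 0.4]]
MCAR = [[0.3, 0.3], [0.3, 0.3]]   # missingness ignores the data
MAR = [[0.2, 0.2], [0.6, 0.6]]    # depends only on the observed Z1
MNAR = [[0.2, 0.6], [0.2, 0.6]]   # depends on the possibly missing Z2

for name, mech in [("MCAR", MCAR), ("MAR", MAR), ("MNAR", MNAR)]:
    print(name, [round(p, 3) for p in complete_case_margin(THETA, mech)])
```

In this toy setting the complete-case margin of the second variable is unbiased only under MCAR. Under MAR the margin is distorted but the complete-case conditional distribution given the first variable is still correct, which is what likelihood methods that ignore the mechanism exploit; under MNAR both are distorted.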

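The observed data consist of the values of the
variables $(Z_{\mathrm{obs}}, M)$, and the distribution of the observed
data is obtained by integrating $Z_{\mathrm{mis}}$ out of
the joint density of $Z = (Z_{\mathrm{obs}}, Z_{\mathrm{mis}})$ and $M$. That
is, for unit $i$,

(1.2)  $f(Z_{\mathrm{obs},i}, M_i \mid \theta, \psi) = \int f(Z_{\mathrm{obs},i}, Z_{\mathrm{mis},i} \mid \theta) \, f(M_i \mid Z_{\mathrm{obs},i}, Z_{\mathrm{mis},i}, \psi) \, dZ_{\mathrm{mis},i}.$

The full likelihood of $\theta$ and $\psi$ is any function of
$\theta$ and $\psi$ proportional to the product of (1.2) over
observations $i$:

$L_{\mathrm{full}}(\theta, \psi \mid Z_{\mathrm{obs}}, M) \propto \prod_{i=1}^{n} f(Z_{\mathrm{obs},i}, M_i \mid \theta, \psi).$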

The missing-data mechanism is called ignorable if
it is MAR and if, in addition, the parameter space
for $(\theta, \psi)$ is a Cartesian product space $\Omega_\theta \times \Omega_\psi$, where
$\theta \in \Omega_\theta$ and $\psi \in \Omega_\psi$. Likelihood-based inferences for $\theta$
can then be based on

$\prod_{i=1}^{n} f(Z_{\mathrm{obs},i} \mid \theta),$

the ignorable likelihood of $\theta$ based on the observed
data $Z_{\mathrm{obs}}$ (Rubin, 1976). Many methods of handling
missing data assume missingness is MCAR or MAR.
If this is assumed, the missing-data mechanism can
be ignored and we only need to model the observed
data $Z_{\mathrm{obs}}$ to derive likelihood-based inferences for $\theta$.
However, these inferences are subject to bias when
the data are not MAR.

Equation (1.1) is sometimes called a selection model
factorization of the joint distribution of $(Z_i, M_i)$ because
of connections with the econometric literature
on selection bias (Heckman, 1976). Clearly other
factorizations are possible. In particular, pattern-mixture
models (Little, 1993) factor the joint distribution
as

(1.3)  $f(Z_i, M_i \mid \varphi, \pi) = f(M_i \mid \pi) \, f(Z_i \mid M_i, \varphi),$

which models the distribution of $Z_i$ for each pattern
of missing data.

Both selection and pattern-mixture models treat
the variables $Z_i$ and missing-data indicators $M_i$ as
single blocks. Little attention has been paid to models
that disaggregate these blocks based on subsets
of variables and their missing-data indicators. One
such class of models is generated by writing $Z_i =
(Z_{i(1)}, Z_{i(2)}, \ldots, Z_{i(B)})$, where $Z_{i(j)}$ is a subset of the
variables, with corresponding missing-data indicators
$M_i = (M_{i(1)}, M_{i(2)}, \ldots, M_{i(B)})$. For convenience,
define the "history" up to block $j$ for unit $i$ as

$\mathcal{H}_{i(j)} = (Z_{i(1)}, M_{i(1)}, \ldots, Z_{i(j)}, M_{i(j)})$

and factor the joint distribution as

(1.4)  $f(Z_i, M_i \mid \theta, \psi) = f(Z_{i(1)}, M_{i(1)} \mid \theta^{(1)}, \psi^{(1)})$
       $\cdot f(Z_{i(2)}, M_{i(2)} \mid \mathcal{H}_{i(1)}, \theta^{(2)}, \psi^{(2)}) \cdots$
       $\cdot f(Z_{i(B)}, M_{i(B)} \mid \mathcal{H}_{i(B-1)}, \theta^{(B)}, \psi^{(B)}).$

We call models based on the factorization (1.4) block-sequential
missing data models. The set $(Z_{i(j)}, M_{i(j)})$
in the jth block might be modeled using the selec- 
tion or pattern-mixture factorization, yielding com- 
binations of (1.1) and (1.3). This approach to mod- 
eling might be seen as natural when the blocks un- 
fold sequentially in time, or if they follow a causal 
sequence, and the variables in a block are condi- 
tioned on prior variables in time or in the causal 
chain. Along these lines, Robins and Gill (1997) and 
Robins (1997) argue that MAR is hard to justify 
causally when data do not have a monotone pat- 
tern, and discuss alternative factorizations that have 
a readier causal interpretation. 

Various modeling assumptions might be incorporated
in (1.4). In this article we consider a particular
form of potentially MNAR models based on
(1.4) with specific assumptions concerning the dependence
of the distribution of the variables in each
block on the history. Specifically, we assume that in
the $j$th block, the joint distribution of $(Z_{i(j)}, M_{i(j)})$
given the history $\mathcal{H}_{i(j-1)}$ is factorized as follows (parameters are
left implicit):

(1.5)  $f(Z_{i(j)}, M_{i(j)} \mid \mathcal{H}_{i(j-1)}) = f(Z_{i(j)} \mid \mathcal{H}_{i(j-1)}) \, f(M_{i(j)} \mid \mathcal{H}_{i(j-1)}, Z_{i(j)}),$

where

$f(Z_{i(j)} \mid \mathcal{H}_{i(j-1)}) = f(Z_{i(j)} \mid Z_{i(1)}, \ldots, Z_{i(j-1)}),$

$f(M_{i(j)} \mid \mathcal{H}_{i(j-1)}, Z_{i(j)}) = f(M_{i(j)} \mid \mathcal{H}_{i(j-1)}, Z_{\mathrm{obs},i(j)}),$

and $Z_{\mathrm{obs},i(j)}$ denotes the observed components of
$Z_{i(j)}$. That is, the distribution of $Z_{i(j)}$ given the previous
variables depends only on the previous $Z$'s,
not the previous $M$'s, and the distribution of $M_{i(j)}$
can depend on previous $Z$'s, $M$'s and $Z_{\mathrm{obs},i(j)}$, but
not on the missing components of $Z_{i(j)}$, say, $Z_{\mathrm{mis},i(j)}$.
We call models of the form (1.5) block-conditional
MAR (BCMAR), since each block would be MAR
if values of $Z$ in previous blocks were fully observed.



For $B = 2$ blocks, (1.5) reduces to

(1.6)  $f(Z_i, M_i \mid \theta, \psi) = f(Z_{i(1)} \mid \theta^{(1)}) \, f(M_{i(1)} \mid Z_{\mathrm{obs},i(1)}, \psi^{(1)})$
       $\cdot f(Z_{i(2)} \mid Z_{i(1)}, \theta^{(2)})$
       $\cdot f(M_{i(2)} \mid Z_{i(1)}, M_{i(1)}, Z_{\mathrm{obs},i(2)}, \psi^{(2)}),$

where $Z_{i(1)}$ is MAR, ignoring information about $Z_{i(2)}$
and $M_{i(2)}$, and missingness of $Z_{i(2)}$ depends on the
observed components of $Z_{i(2)}$, the observed and unobserved
values of $Z_{i(1)}$, and on $M_{i(1)}$. This mechanism
is not in general MAR, since missingness of $Z_{i(2)}$ is
allowed to depend on the missing values $Z_{\mathrm{mis},i(1)}$. For
the particular case where $Z_{i(1)}$ and $Z_{i(2)}$ are single
variables, this reduces to the simpler form

(1.7)  $f(Z_i, M_i \mid \theta, \psi) = f(Z_{i(1)} \mid \theta^{(1)}) \, f(M_{i(1)} \mid \psi^{(1)})$
       $\cdot f(Z_{i(2)} \mid Z_{i(1)}, \theta^{(2)})$
       $\cdot f(M_{i(2)} \mid M_{i(1)}, Z_{i(1)}, \psi^{(2)}),$

because of the MAR condition in each block. In this
case, $Z_{i(1)}$ is MCAR and, given $Z_{i(1)}$, $M_{i(1)}$, $Z_{i(2)}$ is
also MAR. In Section 2 we describe inference for BCMAR
models based on a block-monotone reduced
likelihood, where the conditional distribution of the 
variables in each block, given the variables in pre- 
vious blocks, is computed using only the subset of 
cases for which the variables in previous blocks are 
fully observed. This reduced likelihood is related to, but
not quite the same as, a partial likelihood as defined
by Cox (1975). This reduced likelihood does not re- 
quire a model for the distribution of the missing- 
data indicators M . This is a useful property, since 
specifying models for M can be challenging, and re- 
sults are vulnerable to misspecification. The block- 
monotone reduced likelihood becomes the full like- 
lihood when data have a particular pattern, which 
we call block monotone. 

Use of the block-monotone reduced likelihood gen- 
erally involves a loss of information, and an interest- 
ing question is how much information is lost; the re- 
mainder of the paper examines this question in the 
context of simple bivariate examples. We analyze 
in detail the model (1.7) for the case of bivariate categorical
$Z$, where the complete cases form a 2-way
contingency table, and the incomplete cases form
supplemental margins (see, for example, Little and
Rubin, 2002, Chapter 13). In addition, we give a less
detailed analysis of a more general example with two
blocks where the distribution of $Z_{i(2)}$ is from the exponential
family.

The EM algorithm (Dempster, Laird and Rubin, 
1977), a ubiquitous algorithm for ML estimation 
from incomplete data and the topic of this special 
issue, plays a useful role in fitting these models. EM 
is particularly appealing for categorical data, since 
the Poisson and multinomial distributions for mod- 
eling count data yield complete data loglikelihoods 
that are linear in the cell counts. Consequently, the 
E step of EM consists of replacing the complete-data 
cell counts by conditional expectations given the ob- 
served data, in effect distributing the supplemental 
margins into the full table according to current es- 
timates of the cell probabilities. The M step of EM 
is the same as complete-data ML estimation based 
on the data filled in by the E step. This approach to 
estimation for count data with some grouped counts 
was first established as ML by Hartley (1958). The 
application to a (2 x 2) table with supplemental mar- 
gins was considered by Chen and Fienberg (1974), 
and extended to the general class of loglinear models 
by Fuchs (1982). 

For some hierarchical loglinear models the M step 
of EM requires iteration, so EM involves double it- 
eration. The usual approach is the Deming-Stephan 
algorithm, also known as iterative proportional fit- 
ting (Bishop, Fienberg and Holland, 1975). If the 
M step is restricted to just one iteration of Deming- 
Stephan, the result is an example of an ECM 
(Expectation Conditional Maximization) algorithm, 
which achieves similar theoretical properties to EM 
with just a single iterative loop (Meng and Rubin, 
1993; Little and Rubin, 2002). EM is also useful for 
fitting MNAR models for contingency tables (Baker 
and Laird, 1985; Fay, 1986; Rubin, Stern and Ve- 
hovar, 1995; Little and Rubin 2002, Section 15.7). 
As shown below, EM also plays a useful role for 
BCMAR models. 

In Section 3, we consider ML estimation for a
BCMAR model for bivariate categorical data, where
$Z = (Z_{(1)}, Z_{(2)})$ are assumed to have a multinomial
distribution. The results are surprising. The block-monotone
reduced ML estimates of the parameters
of the joint distribution of $(Z_{(1)}, Z_{(2)})$ (as discussed
in Section 2) are computed noniteratively from the
monotone pattern, excluding the data with $Z_{(2)}$ observed
and $Z_{(1)}$ missing. These are in fact the full
ML estimates, provided the corresponding estimates of
the parameters of the missing-data mechanism all lie
in the admissible range [0, 1]. If not, then the data
with $Z_{(2)}$ observed and $Z_{(1)}$ missing enter into the
full ML estimates, and an iterative algorithm such
as EM is needed to compute them. In Section 4, a restricted
version of the BCMAR model is introduced
where missingness of $Z_{(2)}$ depends on the perhaps
unobserved value of $Z_{(1)}$ but not on whether $Z_{(1)}$
is missing. Some numerical examples are presented
in Section 5 to compare unrestricted and restricted
BCMAR models and MAR models and to illustrate
when the block-monotone reduced ML estimates in
the BCMAR models are full ML. A real data example
is given in Section 6. Section 7 explores a
more general example of a BCMAR model with two
blocks, in which the possibly vector-valued variable
$Z_{(2)}$ arises from a distribution in the exponential
family. Section 8 reviews the ideas of the article and
outlines extensions to other missing-data problems.

2. ESTIMATION OF BLOCK-CONDITIONAL 
MAR MODELS USING A REDUCED 
LIKELIHOOD 

For any BCMAR model, define the block-monotone
reduced likelihood to be

(2.1)  $L_{\mathrm{bm}}(\theta) = \prod_{j=1}^{B} \prod_{i \in Q_j} f(Z_{\mathrm{obs},i(j)} \mid Z_{i(1)}, Z_{i(2)}, \ldots, Z_{i(j-1)}, \theta^{(j)}),$

where $Q_j$ is the subset of cases with $Z_{i(1)}, Z_{i(2)}, \ldots, Z_{i(j-1)}$
fully observed, that is, $M_{i(1)} = M_{i(2)} = \cdots = M_{i(j-1)} = 0$.
Under usual regularity conditions, the
estimator $\hat{\theta}$ of $\theta$ that maximizes $L_{\mathrm{bm}}(\theta)$ has the same
properties as maximum likelihood, in that it is consistent
and asymptotically normal with an asymptotic
covariance matrix estimated by $I(\hat{\theta})^{-1}$, where
$I(\theta) = -\partial^2 \log L_{\mathrm{bm}}(\theta) / \partial\theta \, \partial\theta^{T}$. These results can be
obtained using conditional arguments similar to those
of Cox (1975) in his examination of partial likelihood.

We prove this property for the special case of 
B = 2 blocks; the extension to more than two blocks 
is straightforward. The observed-data likelihood for 
the two blocks can be written 

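(2.2)  $L_{\mathrm{obs}}(\theta, \psi) = \prod_{i=1}^{n} \big\{ f(Z_{\mathrm{obs},i(1)}, M_{i(1)} \mid \theta, \psi)$
       $\cdot [f(Z_{\mathrm{obs},i(2)}, M_{i(2)} \mid Z_{\mathrm{obs},i(1)}, M_{i(1)} = 0, \theta, \psi)]^{\delta_i}$
       $\cdot [f(Z_{\mathrm{obs},i(2)}, M_{i(2)} \mid Z_{\mathrm{obs},i(1)}, M_{i(1)}, \theta, \psi)]^{1 - \delta_i} \big\},$

where $\delta_i = I(M_{i(1)} = 0)$. Note that the second term
in the product refers to the cases for which $i \in Q_2$.
Consider the pseudo-likelihood generated by the first
two terms in the product (2.2). Let $\gamma = (\theta, \psi)$ and
denote the corresponding scores as

$S_{i(1)} = \frac{\partial}{\partial\gamma} \log f(Z_{\mathrm{obs},i(1)}, M_{i(1)} \mid \theta, \psi)$

and

$S_{i(2)} = \delta_i \frac{\partial}{\partial\gamma} \log f(Z_{\mathrm{obs},i(2)}, M_{i(2)} \mid Z_{\mathrm{obs},i(1)}, M_{i(1)} = 0, \theta, \psi).$

Under usual regularity conditions for the appropriate
conditional densities, it is now easily seen that
$E[S_{i(j)}] = 0$ and $E[S_{i(j)} S_{i(j)}^{T}] = -E[\partial S_{i(j)} / \partial\gamma^{T}]$, where $j =
1, 2$. Finally, by conditioning on $Z_{\mathrm{obs},i(1)}, M_{i(1)}$, it
can be seen that $E[S_{i(1)} S_{i(2)}^{T}] = 0$, so that the scores
are uncorrelated. It follows that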

(2.3)  $\sum_{i=1}^{n} [S_{i(1)}(\theta, \psi) + S_{i(2)}(\theta, \psi)] = 0$

is an unbiased estimating equation with asymptotic
properties similar to those of a likelihood score equation.
Under i.i.d. assumptions for the data $\{(Z_{i(1)},
M_{i(1)}, Z_{i(2)}, M_{i(2)}), i = 1, \ldots, n\}$, the central limit theorem
applies to the total score and a Taylor expansion
gives the usual asymptotic normal results for
the estimators $\hat{\theta}, \hat{\psi}$ that arise as a solution to (2.3).
Further, the asymptotic variance of $(\hat{\theta}, \hat{\psi})$ can be estimated
as the inverse of the usual observed information.
Finally, we note that

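$L_{\mathrm{obs}}(\theta, \psi) = \prod_{i=1}^{n} f(Z_{\mathrm{obs},i(1)} \mid \theta^{(1)}) \, f(M_{i(1)} \mid Z_{\mathrm{obs},i(1)}, \psi^{(1)})$
$\cdot \prod_{i \in Q_2} f(Z_{\mathrm{obs},i(2)} \mid Z_{i(1)}, \theta^{(2)}) \, f(M_{i(2)} \mid Z_{i(1)}, M_{i(1)} = 0, Z_{\mathrm{obs},i(2)}, \psi^{(2)})$
$\cdot \prod_{i \notin Q_2} f(Z_{\mathrm{obs},i(2)}, M_{i(2)} \mid Z_{\mathrm{obs},i(1)}, M_{i(1)}, \theta, \psi),$

where the factorization of the first two products into
distinct components for $\theta$ and $\psi$ is a result of the
BCMAR assumptions. Rearranging terms, we can
write

$L_{\mathrm{obs}}(\theta, \psi) = L_{\mathrm{bm}}(\theta) \times L_{M}(\psi) \times L_{\mathrm{rest}}(\theta, \psi),$

where

$L_{\mathrm{bm}}(\theta) = \prod_{i=1}^{n} f(Z_{\mathrm{obs},i(1)} \mid \theta^{(1)}) \prod_{i \in Q_2} f(Z_{\mathrm{obs},i(2)} \mid Z_{i(1)}, \theta^{(2)}),$

$L_{M}(\psi) = \prod_{i=1}^{n} f(M_{i(1)} \mid Z_{\mathrm{obs},i(1)}, \psi^{(1)}) \prod_{i \in Q_2} f(M_{i(2)} \mid Z_{i(1)}, M_{i(1)} = 0, Z_{\mathrm{obs},i(2)}, \psi^{(2)}),$

$L_{\mathrm{rest}}(\theta, \psi) = \prod_{i \notin Q_2} f(Z_{\mathrm{obs},i(2)}, M_{i(2)} \mid Z_{\mathrm{obs},i(1)}, M_{i(1)}, \theta, \psi).$

It can then be easily seen that the observed information
matrix based on the first two components
is diagonal in the parameters, and the asymptotic
results for $\hat{\theta}$ can be determined from $L_{\mathrm{bm}}(\theta)$ as described
above.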

The block-monotone reduced likelihood inference
drops the components $L_{M}(\psi)$ and $L_{\mathrm{rest}}(\theta, \psi)$ from
the likelihood, and bases inference about $\theta$ on the
remaining term $L_{\mathrm{bm}}(\theta)$. This provides a convenient
approach to inference, since the block-monotone reduced
likelihood does not involve the distributions of
the missing-data indicators, and, hence, these distributions
do not need to be specified. Correctly specifying
these distributions is not easy, and estimates
of $\theta$ are vulnerable to their misspecification.

We say that $Z_i = (Z_{i(1)}, Z_{i(2)}, \ldots, Z_{i(B)})$ have a
block monotone pattern if, for all $j$, $Z_{i(j-1)}$ is fully
observed whenever $Z_{i(j)}$ has at least one observed
component. Note that block monotonicity is weaker
than a monotone pattern for all the variables, since
the variables within each block do not necessarily
have a monotone pattern. If the data have a block
monotone pattern, the term $L_{\mathrm{rest}}(\theta, \psi)$ is no longer
present, and the block-monotone reduced likelihood
is equivalent to the full likelihood for inference about
$\theta$, provided the parameters $\theta$ and $\psi$ are distinct. In
other situations, dropping the term $L_{\mathrm{rest}}(\theta, \psi)$ involves
a loss of information, so the estimates are
not in general fully efficient compared with full ML.
We explore this potential loss in efficiency for some 
simple models in the remainder of this article. 
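
As a rough illustration of which cases enter each factor of the reduced likelihood (2.1), the sketch below builds the sets $Q_j$ from blockwise missing-data indicators and checks the block monotone condition as read here (a block with any observed component must be preceded by a fully observed block). The data layout, the function names and the toy indicator values are assumptions made for this example.

```python
def reduced_likelihood_case_sets(M):
    """Return [Q_1, ..., Q_B] as lists of case indices.

    M[i][j] is a tuple of missing-data indicators (1 = missing, 0 = observed)
    for the variables of block j in case i. Q_j keeps the cases whose blocks
    before j are fully observed; the first block keeps every case.
    """
    n_blocks = len(M[0])
    return [[i for i, row in enumerate(M)
             if all(m == 0 for block in row[:j] for m in block)]
            for j in range(n_blocks)]


def is_block_monotone(M):
    """True if any block with an observed component follows a fully observed block."""
    for row in M:
        for prev, cur in zip(row, row[1:]):
            if any(m == 0 for m in cur) and any(m == 1 for m in prev):
                return False
    return True


# Hypothetical indicators: block 1 holds one variable, block 2 holds two.
M = [
    [(0,), (0, 0)],   # complete case
    [(0,), (1, 0)],   # block 2 partly observed
    [(1,), (0, 0)],   # block 1 missing but block 2 observed
    [(1,), (1, 1)],   # nothing observed
]
Q = reduced_likelihood_case_sets(M)
print(Q, is_block_monotone(M))
```

The third case is the kind of observation that the reduced likelihood discards for the second block: it is excluded from the second case set, and its presence is what makes this toy pattern fail the block monotone check.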

3. UNRESTRICTED BCMAR MODELS FOR 
BIVARIATE CATEGORICAL DATA 

We consider data with $B = 2$, $Z = (Z_{(1)}, Z_{(2)})$, where
$Z_{(1)}$ and $Z_{(2)}$ are categorical variables with $J$ and
$K$ categories respectively. Both $Z_{(1)}$ and $Z_{(2)}$ may
be missing, so there are four missing-data patterns.
Let $r = 0, 1, 2, 3$ index the missing-data patterns and
let $P_r$ denote the set of sample cases with pattern
type $r$, $r = 0, \ldots, 3$ (see Table 1). Let $n_r$ denote the
number of cases in the sample with pattern $r$ and
$n = \sum_{r=0}^{3} n_r$ denote the total sample size.

For categorical $Z_{(1)}$ and $Z_{(2)}$ with $J$ and $K$ levels,
data in $P_0$ can be arranged as a $J \times K$ contingency
table, and the data in $P_1$ and $P_2$ form supplemental
$J \times 1$ and $1 \times K$ margins. Let $n_{(0),jk}$ be the count
of complete cases with $Z_{(1)} = j$, $Z_{(2)} = k$, $n_{(1),j+}$ be
the count of cases with $Z_{(1)} = j$ and $Z_{(2)}$ missing,
$n_{(2),+k}$ be the count of cases with $Z_{(2)} = k$ and $Z_{(1)}$
missing, and $n_{(3),++}$ be the count of cases with both
$Z_{(1)}$ and $Z_{(2)}$ missing. The data are displayed in
Table 2. Note that $n_0 = \sum_{j=1}^{J} \sum_{k=1}^{K} n_{(0),jk}$, $n_1 =
\sum_{j=1}^{J} n_{(1),j+}$, $n_2 = \sum_{k=1}^{K} n_{(2),+k}$, and $n_3 = n_{(3),++}$.
The parameters of interest are $\theta = \{\theta_{jk}\}$, where
$\theta_{jk} = P(Z_{(1)} = j, Z_{(2)} = k)$ with $\sum_{j=1}^{J} \sum_{k=1}^{K} \theta_{jk} = 1$.
The MAR assumption for these data implies that

$P(M_{(1)} = M_{(2)} = 1 \mid Z_{(1)} = j, Z_{(2)} = k) = \nu,$
$P(M_{(1)} = 0, M_{(2)} = 1 \mid Z_{(1)} = j, Z_{(2)} = k) = \nu_j^{(0)},$
$P(M_{(1)} = 1, M_{(2)} = 0 \mid Z_{(1)} = j, Z_{(2)} = k) = \nu_k^{(1)},$
$P(M_{(1)} = M_{(2)} = 0 \mid Z_{(1)} = j, Z_{(2)} = k) = 1 - \nu - \nu_j^{(0)} - \nu_k^{(1)},$

where $1 \le j \le J$, $1 \le k \le K$ and $M_{(1)}$ and $M_{(2)}$ are
missing-data indicators for $Z_{(1)}$ and $Z_{(2)}$, with 1 and 0
denoting missing and observed values respectively
(see Little and Rubin, 2002, Example 1.19). In this
case, $\zeta = \{\nu, \nu_j^{(0)}, \nu_k^{(1)}\}$ represent nuisance parameters
for the missing-data mechanism. Under MAR,
the likelihood factors into distinct components of $\theta$
and $\zeta$, and ML estimation of $\theta$ under MAR involves all
the observed data and typically requires an iterative
algorithm such as EM (Little and Rubin, 2002,
Chapter 13).

Table 1
Missing-data pattern for two variables

Pattern   Z(1)       Z(2)
P0        observed   observed
P1        observed   missing
P2        missing    observed
P3        missing    missing

Table 2
Notation for a J x K table with supplemental margins for
both variables

                                  Z(2)
Z(1)      1            2            ...   K            Missing
1         n_(0),11     n_(0),12     ...   n_(0),1K     n_(1),1+
2         n_(0),21     n_(0),22     ...   n_(0),2K     n_(1),2+
...       ...          ...          ...   ...          ...
J         n_(0),J1     n_(0),J2     ...   n_(0),JK     n_(1),J+
Missing   n_(2),+1     n_(2),+2     ...   n_(2),+K     n_(3),++

We consider as an alternative to MAR the following
BCMAR model (1.7), which incorporates the
assumption that $Z_{(1)}$ is MCAR and missingness of
$Z_{(2)}$ depends on $Z_{(1)}$ and $M_{(1)}$:

(3.1)  $P(M_{(1)} = 1 \mid Z_{(1)} = j, Z_{(2)} = k) = \phi,$
       $P(M_{(2)} = 1 \mid M_{(1)} = 0, Z_{(1)} = j, Z_{(2)} = k) = \phi_j^{(0)},$
       $P(M_{(2)} = 1 \mid M_{(1)} = 1, Z_{(1)} = j, Z_{(2)} = k) = \phi_j^{(1)},$

where $1 \le j \le J$, $1 \le k \le K$. Here $\Phi = \{\phi, \phi_j^{(0)}, \phi_j^{(1)}\}$
are nuisance parameters corresponding to the missing-data
mechanism. The number of parameters in this
model is $JK + 2J$, whereas the degrees of freedom
of the data are $JK + J + K$, which comprise $JK$
for the complete cases, plus $J$ for the supplemental
margin on $Z_{(1)}$, plus $K$ for the supplemental margin
on $Z_{(2)}$, plus 1 for the number of cases with $Z_{(1)}$
and $Z_{(2)}$ both missing, minus 1 for the total, which
is considered fixed at $n$. When $J = K$, the model
has the same number of parameters as degrees of
freedom in the data; otherwise, the model has more
parameters for $J > K$ or fewer for $J < K$.
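
The counting argument above can be checked mechanically. The helper below is only a bookkeeping sketch with a hypothetical name; it tallies the free parameters of model (3.1) and the degrees of freedom in a $J \times K$ table with supplemental margins.

```python
def bcmar_counts(J, K):
    """Return (number of parameters in model (3.1), degrees of freedom in the data)."""
    # theta has JK - 1 free cells, phi is one parameter,
    # and phi_j^(0), phi_j^(1) contribute J parameters each.
    n_params = (J * K - 1) + 1 + J + J
    # JK complete-case cells, J + K supplemental margin cells,
    # one doubly missing cell, minus one for the fixed total n.
    data_df = J * K + J + K + 1 - 1
    return n_params, data_df


for J, K in [(2, 2), (3, 2), (2, 3)]:
    print(J, K, bcmar_counts(J, K))
```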



Note that if $\phi_j^{(1)}$ does not depend on $j$,
this reduces to a restricted MAR model in which
$Z_{(1)}$ is MCAR and missingness of $Z_{(2)}$ depends on
$M_{(1)}$, and only depends on $Z_{(1)}$ for the pattern with
$Z_{(1)}$ observed. A likelihood ratio test could be used
to test this restricted MAR assumption against the
more general BCMAR model and the EM algorithm
can be applied to compute the ML estimates (Little
and Rubin, 2002, Chapter 13). This restricted
MAR model is introduced as a testable submodel
of the unrestricted BCMAR model, but we do not
view it as particularly appealing substantively, since
if missingness of $Z_{(2)}$ depends on $Z_{(1)}$ for the cases
with $Z_{(1)}$ observed, one might also expect it to depend
on $Z_{(1)}$ for the cases with $Z_{(1)}$ missing. Another
submodel of the unrestricted BCMAR model
is discussed in Section 4.

3.1 EM Algorithm 

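The full likelihood for the above model is

(3.2)  $L(\theta, \Phi \mid Z_{\mathrm{obs},(1)}, Z_{\mathrm{obs},(2)}, M)$
       $= \prod_{i \in P_0} p(Z_{i(1)}, Z_{i(2)} \mid \theta)(1 - \phi) \, p(M_{i(2)} = 0 \mid Z_{i(1)}, M_{i(1)} = 0, \Phi)$
       $\cdot \prod_{i \in P_1} p(Z_{i(1)} \mid \theta)(1 - \phi) \, p(M_{i(2)} = 1 \mid Z_{i(1)}, M_{i(1)} = 0, \Phi)$
       $\cdot \prod_{i \in P_2} \sum_{Z_{i(1)}} p(Z_{i(1)}, Z_{i(2)} \mid \theta) \, \phi \, p(M_{i(2)} = 0 \mid Z_{i(1)}, M_{i(1)} = 1, \Phi)$
       $\cdot \prod_{i \in P_3} \sum_{Z_{i(1)}} p(Z_{i(1)} \mid \theta) \, \phi \, p(M_{i(2)} = 1 \mid Z_{i(1)}, M_{i(1)} = 1, \Phi).$

The block-monotone reduced likelihood is

(3.3)  $L_{\mathrm{bm}}(\theta \mid Z_{\mathrm{obs},(1)}, Z_{\mathrm{obs},(2)}) = \prod_{i \in P_0} p(Z_{i(1)}, Z_{i(2)} \mid \theta) \prod_{i \in P_1} p(Z_{i(1)} \mid \theta),$

which does not model the missing data mechanism,
and drops the data for patterns $P_2$ and $P_3$. We first
consider ML estimation for the full likelihood (3.2),
and then discuss the relationship between these ML
estimates and the estimates that maximize the block-monotone
reduced likelihood (3.3).

One approach to ML estimation is to apply the
EM algorithm. To define the E step of EM, let
$(\theta^{(t)}, \Phi^{(t)})$ denote the parameter estimates at iteration
$t$, and $n^{(t)}_{(r),jk}$ the estimate of the cell frequency for
$Z_{i(1)} = j$, $Z_{i(2)} = k$ in pattern $P_r$. The E step distributes
the partially classified observations into the
table according to the corresponding probabilities:

$n^{(t)}_{(1),jk} = n_{(1),j+} \, \theta^{(t)}_{jk} / \theta^{(t)}_{j+},$

$n^{(t)}_{(2),jk} = n_{(2),+k} \, \theta^{(t)}_{jk} (1 - \phi_j^{(1),(t)}) / \sum_{l} \theta^{(t)}_{lk} (1 - \phi_l^{(1),(t)}),$

$n^{(t)}_{(3),jk} = n_{(3),++} \, \theta^{(t)}_{jk} \phi_j^{(1),(t)} / \sum_{l,m} \theta^{(t)}_{lm} \phi_l^{(1),(t)}.$

The M step calculates new parameters as follows:

$\theta^{(t+1)}_{jk} = (n_{(0),jk} + n^{(t)}_{(1),jk} + n^{(t)}_{(2),jk} + n^{(t)}_{(3),jk}) / n,$

$\hat{\phi} = \sum_{i=1}^{n} I(M_{i(1)} = 1) / n = (n_2 + n_3) / n,$

$\hat{\phi}_j^{(0)} = \sum_{i=1}^{n} I(M_{i(1)} = 0, M_{i(2)} = 1, Z_{i(1)} = j) / \sum_{i=1}^{n} I(M_{i(1)} = 0, Z_{i(1)} = j) = n_{(1),j+} / (n_{(1),j+} + n_{(0),j+}),$

$\phi_j^{(1),(t+1)} = \sum_k n^{(t)}_{(3),jk} / (\sum_k n^{(t)}_{(2),jk} + \sum_k n^{(t)}_{(3),jk}).$

The E step and M step alternate until the parameter
estimates converge.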

Note that $\phi$ and $\phi_j^{(0)}$ are estimated directly and
are unchanged throughout the EM algorithm. Complete-case
estimates or estimates arising from the
monotone pattern $P_0$ and $P_1$ can be chosen as the
starting values of $\{\theta_{jk}\}$, and the estimates of $\{\phi_j^{(0)}\}$
or any constant in (0, 1) can be taken as initial values
of $\{\phi_j^{(1)}\}$. When $J > K$, the model has more parameters
than degrees of freedom. In this case,
multiple maxima may exist, and depending on starting
values, the EM algorithm can converge to different 
estimates. This case will be discussed further below. 
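
The E and M steps for the unrestricted model can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' software: the function names are hypothetical, the counts are those of Table 3A in Section 5, the start uses complete-case cell proportions with every $\phi_j^{(1)}$ set to 0.5, and the loglikelihood omits the terms in $\phi$ and $\phi_j^{(0)}$ because these are estimated directly and stay constant. It assumes all current estimates are strictly inside (0, 1).

```python
import math

# Counts from Table 3A (Section 5): complete cases and supplemental margins.
N0 = [[50, 150], [75, 75]]   # n_(0),jk : Z1 = j and Z2 = k both observed
N1 = [30, 60]                # n_(1),j+ : Z1 = j observed, Z2 missing
N2 = [28, 60]                # n_(2),+k : Z2 = k observed, Z1 missing
N3 = 50                      # n_(3),++ : both missing


def em_step(theta, phi1, n0, n1, n2, n3):
    """One EM iteration for (theta, phi_j^(1)) under the unrestricted BCMAR model."""
    J, K = len(theta), len(theta[0])
    n = sum(map(sum, n0)) + sum(n1) + sum(n2) + n3
    row = [sum(theta[j]) for j in range(J)]          # theta_{j+}
    # Normalizing constants for patterns P2 and P3.
    col_p2 = [sum(theta[j][k] * (1 - phi1[j]) for j in range(J)) for k in range(K)]
    tot_p3 = sum(row[j] * phi1[j] for j in range(J))
    # E step: distribute the partially classified counts over the J x K table.
    e1 = [[n1[j] * theta[j][k] / row[j] for k in range(K)] for j in range(J)]
    e2 = [[n2[k] * theta[j][k] * (1 - phi1[j]) / col_p2[k] for k in range(K)]
          for j in range(J)]
    e3 = [[n3 * theta[j][k] * phi1[j] / tot_p3 for k in range(K)] for j in range(J)]
    # M step: complete-data ML from the filled-in table.
    new_theta = [[(n0[j][k] + e1[j][k] + e2[j][k] + e3[j][k]) / n
                  for k in range(K)] for j in range(J)]
    new_phi1 = [sum(e3[j]) / (sum(e2[j]) + sum(e3[j])) for j in range(J)]
    return new_theta, new_phi1


def loglik(theta, phi1, n0, n1, n2, n3):
    """Observed-data loglikelihood, omitting the terms in phi and phi_j^(0)."""
    J, K = len(theta), len(theta[0])
    row = [sum(theta[j]) for j in range(J)]
    ll = sum(n0[j][k] * math.log(theta[j][k]) for j in range(J) for k in range(K))
    ll += sum(n1[j] * math.log(row[j]) for j in range(J))
    ll += sum(n2[k] * math.log(sum(theta[j][k] * (1 - phi1[j]) for j in range(J)))
              for k in range(K))
    ll += n3 * math.log(sum(row[j] * phi1[j] for j in range(J)))
    return ll


def fit(n0, n1, n2, n3, iters=500):
    """Run a fixed number of EM iterations and record the loglikelihood trace."""
    n_cc = sum(map(sum, n0))
    theta = [[c / n_cc for c in r] for r in n0]      # complete-case start
    phi1 = [0.5] * len(n0)                           # arbitrary interior start
    trace = [loglik(theta, phi1, n0, n1, n2, n3)]
    for _ in range(iters):
        theta, phi1 = em_step(theta, phi1, n0, n1, n2, n3)
        trace.append(loglik(theta, phi1, n0, n1, n2, n3))
    return theta, phi1, trace


THETA_HAT, PHI1_HAT, TRACE = fit(N0, N1, N2, N3)
```

A fixed iteration count is used here only to keep the sketch short; in practice one would stop when successive parameter estimates or loglikelihood values change by less than a chosen tolerance.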

3.2 Noniterative ML Estimates 

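When $J \ge K$, noniterative estimates of the parameters
can sometimes be obtained using the factored
likelihood method (Little and Rubin, 2002,
Chapter 7). We transform the parameters $\{\theta_{jk}, \phi, \phi_j^{(0)}, \phi_j^{(1)}\}$ to

(3.4)  $\alpha_{(0),jk} = P(Z_{(1)} = j, Z_{(2)} = k \mid M_{(1)} = M_{(2)} = 0),$
       $\beta_{(1),j+} = P(Z_{(1)} = j \mid M_{(1)} = 0, M_{(2)} = 1),$
       $\gamma_{(2),+k} = P(Z_{(2)} = k \mid M_{(1)} = 1, M_{(2)} = 0),$
       $\pi_0 = P(M_{(1)} = 0, M_{(2)} = 0),$
       $\pi_1 = P(M_{(1)} = 0, M_{(2)} = 1),$
       $\pi_2 = P(M_{(1)} = 1, M_{(2)} = 0),$
       $\pi_3 = P(M_{(1)} = 1, M_{(2)} = 1),$

where $1 \le j \le J$, $1 \le k \le K$ and the following constraints
apply:

$\sum_{j=1}^{J} \sum_{k=1}^{K} \alpha_{(0),jk} = 1, \quad \sum_{j=1}^{J} \beta_{(1),j+} = 1, \quad \sum_{k=1}^{K} \gamma_{(2),+k} = 1, \quad \sum_{r=0}^{3} \pi_r = 1.$

These parameters correspond to a pattern-mixture
factorization, as in (1.3). The components of $(\theta, \Phi) =
\{\theta_{jk}, \phi, \phi_j^{(0)}, \phi_j^{(1)}\}$ can be expressed in terms of the
new parametrization (3.4) as follows:

(3.5)  $\theta_{jk} = \frac{\alpha_{(0),jk}}{\alpha_{(0),j+}} \cdot \frac{\pi_0 \alpha_{(0),j+} + \pi_1 \beta_{(1),j+}}{\pi_0 + \pi_1},$
       $\phi = 1 - \pi_0 - \pi_1,$
       $\phi_j^{(0)} = \frac{\pi_1 \beta_{(1),j+}}{\pi_0 \alpha_{(0),j+} + \pi_1 \beta_{(1),j+}},$

and $\{\phi_j^{(1)}, j = 1, \ldots, J\}$ is a solution to the $K$ simultaneous
equations

$\sum_{j=1}^{J} (1 - \phi_j^{(1)}) \theta_{jk} = P(M_{(2)} = 0, Z_{(2)} = k \mid M_{(1)} = 1) = \frac{\pi_2 \gamma_{(2),+k}}{1 - \pi_0 - \pi_1},$

where $\alpha_{(0),j+} = \sum_{k=1}^{K} \alpha_{(0),jk}$.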

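Letting $(\varphi, \pi)$ represent the parameters in (3.4),
the likelihood can be written as

$L(\varphi, \pi \mid Z_{\mathrm{obs},(1)}, Z_{\mathrm{obs},(2)}, M)$
$= \prod_{i=1}^{n} p(M_{i(1)}, M_{i(2)}) \prod_{i \in P_0} p(Z_{i(1)}, Z_{i(2)} \mid M_{i(1)} = 0, M_{i(2)} = 0)$
$\cdot \prod_{i \in P_1} p(Z_{i(1)} \mid M_{i(1)} = 0, M_{i(2)} = 1) \prod_{i \in P_2} p(Z_{i(2)} \mid M_{i(1)} = 1, M_{i(2)} = 0)$
$= \prod_{r=0}^{3} \pi_r^{n_r} \prod_{j,k=1}^{J,K} \alpha_{(0),jk}^{n_{(0),jk}} \prod_{j=1}^{J} \beta_{(1),j+}^{n_{(1),j+}} \prod_{k=1}^{K} \gamma_{(2),+k}^{n_{(2),+k}}.$

Maximizing the four terms in this likelihood yields

$\hat{\alpha}_{(0),jk} = n_{(0),jk} / n_0, \quad \hat{\beta}_{(1),j+} = n_{(1),j+} / n_1, \quad \hat{\gamma}_{(2),+k} = n_{(2),+k} / n_2, \quad \hat{\pi}_r = n_r / n,$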

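where $1 \le j \le J$, $1 \le k \le K$ and $0 \le r \le 3$. Estimates
of $\theta_{jk}$, $\phi$ and $\phi_j^{(0)}$ can then be obtained by substituting
the above estimates of $(\varphi, \pi) = (\alpha_{(0),jk}, \beta_{(1),j+},
\gamma_{(2),+k}, \pi_r)$ into equation (3.5). This yields

(3.6)  $\hat{\theta}_{jk} = \frac{\hat{\alpha}_{(0),jk}}{\hat{\alpha}_{(0),j+}} \cdot \frac{\hat{\pi}_0 \hat{\alpha}_{(0),j+} + \hat{\pi}_1 \hat{\beta}_{(1),j+}}{\hat{\pi}_0 + \hat{\pi}_1} = \frac{n_{(0),jk}}{n_{(0),j+}} \cdot \frac{n_{(0),j+} + n_{(1),j+}}{n_0 + n_1},$

(3.7)  $\hat{\phi} = 1 - \hat{\pi}_0 - \hat{\pi}_1, \quad \hat{\phi}_j^{(0)} = \frac{\hat{\pi}_1 \hat{\beta}_{(1),j+}}{\hat{\pi}_0 \hat{\alpha}_{(0),j+} + \hat{\pi}_1 \hat{\beta}_{(1),j+}}.$

Estimates of $\{\phi_j^{(1)}, j = 1, \ldots, J\}$ can be obtained as
solutions of the following $K$ simultaneous equations,
provided they are in the parameter space:

(3.8)  $\sum_{j=1}^{J} (1 - \hat{\phi}_j^{(1)}) \hat{\theta}_{jk} = \frac{\hat{\pi}_2}{1 - \hat{\pi}_0 - \hat{\pi}_1} \hat{\gamma}_{(2),+k}, \quad k = 1, \ldots, K.$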

This approach yields ML estimates, provided the
estimates lie within the parameter space, that is,
the probabilities lie between zero and one. The expressions
for $\hat{\theta}_{jk}$, $\hat{\phi}$ and $\hat{\phi}_j^{(0)}$ always yield estimates
in [0, 1]. The equations in (3.8), however, may or
may not yield solutions for $\{\phi_j^{(1)}\}$ that lie in [0, 1]. If
they do, then estimates from this approach are ML
estimates and the ML estimates of $\theta_{jk}$, $\phi$ and $\phi_j^{(0)}$
are unique. If not, this approach fails to yield ML
estimates of the parameters of interest. In this case,
however, the EM algorithm can still be used, and
whether the ML estimate is unique or not depends
on the form of the likelihood. If the likelihood is
unimodal, the ML estimate is unique. The solution
set for (3.8) depends on whether $J = K$ or $J > K$.
When $J = K$ there are $J$ equations for $J$ unknowns.
Provided the $J \times J$ matrix $\hat{\Theta} = (\hat{\theta}_{jk})$ is nonsingular,
these equations yield a unique solution that may
or may not lie in the parameter space. When $J > K$
and $\hat{\Theta}$ has rank $K' < J$, the solution set is a linear
subspace of dimension $J - K'$. If the solution space
intersects the parameter space $[0, 1]^J$, then this approach
yields the ML estimates. For example, consider
the case where $J = 3$, $K = 2$ and $\hat{\Theta}$ is of full
rank $K$; the solution set to (3.8) is then a straight line.
When it intersects the unit cube representing the
parameter space, this approach yields unique ML
estimates of $\theta_{jk}$, $\phi$ and $\phi_j^{(0)}$, but any point in $[0, 1]^3$
that is in the solution set of (3.8) is an ML estimate
for $\{\phi_j^{(1)}\}$. However, when the solution set does not
intersect the unit cube, this method fails to yield
the ML estimates of the parameters. The EM algorithm
can be implemented to find ML estimates,
which may or may not be unique. When $J < K$,
noniterative ML estimates do not exist and the EM
algorithm can be applied to compute ML estimates.

The closed-form estimates (3.6) of $\theta$ are simply the
product of the estimated conditional probabilities
of $Z_{(2)} = k$ given $Z_{(1)} = j$ from the complete cases
and the marginal probabilities of $Z_{(1)} = j$ from the
cases with $Z_{(1)}$ observed. These estimates maximize
the block-monotone reduced likelihood discussed in
Section 2, which drops the data for $Z_{(2)}$ from the
pattern $P_2$ with $Z_{(2)}$ observed and $Z_{(1)}$ missing. One
would expect the data in $P_2$ to provide additional information
for the marginal distribution of $Z_{(2)}$, but
this is only the case if the data in $P_2$ are inconsistent
with the data on $Z_{(2)}$ from $P_0$ and $P_1$, in the
sense of yielding estimates of $\{\phi_j^{(1)}\}$ from (3.8) that
lie outside the interval [0, 1].
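
The noniterative route can be illustrated on the two 2 x 2 data sets of Table 3 in Section 5. The sketch below evaluates (3.6) and (3.7) and then solves the two equations (3.8) by Cramer's rule, which covers only the case $J = K = 2$; the function names are hypothetical, and a general $J = K$ implementation would need a linear solver instead.

```python
def closed_form(n0, n1):
    """Noniterative estimates: theta_jk from (3.6) and phi_j^(0) from (3.7)."""
    J, K = len(n0), len(n0[0])
    row0 = [sum(r) for r in n0]                     # n_(0),j+
    n01 = sum(row0) + sum(n1)                       # n_0 + n_1
    theta = [[n0[j][k] / row0[j] * (row0[j] + n1[j]) / n01 for k in range(K)]
             for j in range(J)]
    phi0 = [n1[j] / (row0[j] + n1[j]) for j in range(J)]
    return theta, phi0


def solve_phi1_2x2(theta, n2, n3):
    """Solve the two equations (3.8) for phi_j^(1) when J = K = 2."""
    tot = sum(n2) + n3
    r = [n2[0] / tot, n2[1] / tot]                  # right-hand sides of (3.8)
    det = theta[0][0] * theta[1][1] - theta[1][0] * theta[0][1]
    u1 = (r[0] * theta[1][1] - theta[1][0] * r[1]) / det   # u_j = 1 - phi_j^(1)
    u2 = (theta[0][0] * r[1] - theta[0][1] * r[0]) / det
    return [1 - u1, 1 - u2]


def admissible(probs):
    """True if every value is a valid probability."""
    return all(0.0 <= p <= 1.0 for p in probs)


TABLE_3A = dict(n0=[[50, 150], [75, 75]], n1=[30, 60], n2=[28, 60], n3=50)
TABLE_3B = dict(n0=[[100, 50], [75, 75]], n1=[30, 60], n2=[28, 60], n3=50)

RESULTS = {}
for name, t in [("3A", TABLE_3A), ("3B", TABLE_3B)]:
    theta_hat, phi0_hat = closed_form(t["n0"], t["n1"])
    phi1_hat = solve_phi1_2x2(theta_hat, t["n2"], t["n3"])
    RESULTS[name] = (theta_hat, phi0_hat, phi1_hat)
    print(name, "admissible:", admissible(phi1_hat))
```

For Table 3A the solution of (3.8) falls inside [0, 1], so the closed-form values are the ML estimates; for Table 3B it does not, which is the situation where an iterative method such as EM is required.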

4. A RESTRICTED BCMAR MODEL 

In the unrestricted BCMAR model (3.1), the missingness
of $Z_{(2)}$ is allowed to depend not only on
the (perhaps unobserved) value of $Z_{(1)}$ but also on
whether $Z_{(1)}$ is missing or not. If, given the value
of $Z_{(1)}$, the probability of $Z_{(2)}$ being missing is assumed
the same for the cases with $Z_{(1)}$ observed
and missing, we then have the restricted BCMAR
model:

(4.1)  $P(M_{(1)} = 1 \mid Z_{(1)} = j, Z_{(2)} = k) = \phi,$
       $P(M_{(2)} = 1 \mid M_{(1)} = l, Z_{(1)} = j, Z_{(2)} = k) = \phi_j,$

where $l = 0, 1$ and $1 \le j \le J$, $1 \le k \le K$. The number
of parameters in this model is $JK + J$, which is
always less than the degrees of freedom $JK + J + K$
in the data. The explicit estimates in (3.6) are no
longer ML estimates of $\{\theta_{jk}\}$, and EM is needed
to obtain ML estimates of the parameters. In the
E step, the partially classified observations are effectively
distributed into the table according to the
corresponding estimated probabilities:

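$n^{(t)}_{(1),jk} = n_{(1),j+} \, \theta^{(t)}_{jk} / \theta^{(t)}_{j+},$

$n^{(t)}_{(2),jk} = n_{(2),+k} \, \theta^{(t)}_{jk} (1 - \phi_j^{(t)}) / \sum_{l} \theta^{(t)}_{lk} (1 - \phi_l^{(t)}),$

$n^{(t)}_{(3),jk} = n_{(3),++} \, \theta^{(t)}_{jk} \phi_j^{(t)} / \sum_{l,m} \theta^{(t)}_{lm} \phi_l^{(t)}.$

In the M step, new estimates are calculated as

$\theta^{(t+1)}_{jk} = (n_{(0),jk} + n^{(t)}_{(1),jk} + n^{(t)}_{(2),jk} + n^{(t)}_{(3),jk}) / n,$

$\hat{\phi} = (n_2 + n_3) / n,$

$\phi_j^{(t+1)} = \frac{n_{(1),j+} + \sum_k n^{(t)}_{(3),jk}}{n_{(0),j+} + \sum_k n^{(t)}_{(1),jk} + \sum_k n^{(t)}_{(2),jk} + \sum_k n^{(t)}_{(3),jk}}.$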

The E step and M step alternate until the parameter
estimates converge. Since $\phi$ is estimable directly
and is unchanged throughout the EM algorithm,
starting values are only needed for $\{\theta_{jk}\}$ and $\{\phi_j\}$.
Complete-case estimates or pooled estimates from
the monotone patterns $P_0$ and $P_1$ can be used as
starting values of $\{\theta_{jk}\}$. Estimates of $\{\phi_j^{(0)}\}$ in (3.7)
or any constant in (0, 1) can be taken as initial
values of $\{\phi_j\}$.
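
The alternation of E and M steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `em_restricted_bcmar`, the argument layout and the stopping rule are assumptions, and the update formulas are the ones implied by model (4.1) for patterns $P_1$, $P_2$ and $P_3$. Starting values are the complete-case estimates of $\{\theta_{jk}\}$ and the constant 0.5 for $\{\phi_j\}$, as suggested in the text.

```python
def em_restricted_bcmar(n0, n1, n2, n3, tol=1e-10, max_iter=10000):
    """EM sketch for the restricted BCMAR model (4.1) on a J x K table.

    n0[j][k]: counts with both variables observed (pattern P0)
    n1[j]:    counts with Z(1) = j observed and Z(2) missing (P1)
    n2[k]:    counts with Z(2) = k observed and Z(1) missing (P2)
    n3:       count with both variables missing (P3)
    """
    J, K = len(n0), len(n0[0])
    n_cc = sum(map(sum, n0))
    n = n_cc + sum(n1) + sum(n2) + n3
    phi = (sum(n2) + n3) / n  # estimated directly, unchanged by EM
    theta = [[n0[j][k] / n_cc for k in range(K)] for j in range(J)]
    phi_j = [0.5] * J
    for _ in range(max_iter):
        # E step: allocate partially classified counts to the cells.
        row = [sum(theta[j]) for j in range(J)]
        col = [sum(theta[j][k] * (1 - phi_j[j]) for j in range(J))
               for k in range(K)]
        both = sum(row[j] * phi_j[j] for j in range(J))
        f1 = [[n1[j] * theta[j][k] / row[j] for k in range(K)]
              for j in range(J)]
        f2 = [[n2[k] * theta[j][k] * (1 - phi_j[j]) / col[k]
               for k in range(K)] for j in range(J)]
        f3 = [[n3 * theta[j][k] * phi_j[j] / both for k in range(K)]
              for j in range(J)]
        # M step: complete-data estimates from the filled-in table.
        new_theta = [[(n0[j][k] + f1[j][k] + f2[j][k] + f3[j][k]) / n
                      for k in range(K)] for j in range(J)]
        new_phi_j = [(n1[j] + sum(f3[j]))
                     / (sum(n0[j]) + n1[j] + sum(f2[j]) + sum(f3[j]))
                     for j in range(J)]
        delta = max(
            max(abs(new_theta[j][k] - theta[j][k])
                for j in range(J) for k in range(K)),
            max(abs(new_phi_j[j] - phi_j[j]) for j in range(J)),
        )
        theta, phi_j = new_theta, new_phi_j
        if delta < tol:
            break
    return theta, phi, phi_j


# Example call with the counts of Table 3A.
theta_hat, phi_hat, phi_j_hat = em_restricted_bcmar(
    [[50, 150], [75, 75]], [30, 60], [28, 60], 50)
```

The result of the example call can be compared with the restricted BCMAR row of Table 4.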

The restricted BCMAR model (4.1) is a submodel
of the unrestricted BCMAR model (3.1) obtained by
assuming $\phi_j^{(0)} = \phi_j^{(1)}$. The restricted model is
plausible when the mechanism of missingness of $Z_{(1)}$ is
relatively unrelated to the mechanism of missingness of
$Z_{(2)}$, so the probability that one variable is missing is
not thought to be related to whether the other variable
is missing. The appeal of the restricted model
is that it is more parsimonious and will tend to yield
more efficient estimates of the parameters of interest.
A likelihood ratio test can be applied to test
the restricted BCMAR assumption against the more
general unrestricted BCMAR model, and one may
favor the restricted BCMAR model if it is not rejected
by this test.

Table 3
2 × 2 tables with supplemental margins for both variables

                     3A: Z(2)                     3B: Z(2)
Z(1)           1      2    Missing          1      2    Missing
1             50    150       30          100     50       30
2             75     75       60           75     75       60
Missing       28     60       50           28     60       50

5. NUMERICAL EXAMPLES 

5.1 Examples with J = K = 2 

For data given in the 2 × 2 Table 3A with supplemental
margins, the noniterative estimates of $\{\theta_{jk}\}$
that drop the data in $P_2$ are ML estimates under the
unrestricted BCMAR model. The estimates of $\{\theta_{jk}\}$
are also close to those in the restricted BCMAR and
MAR models, which involve all the data (Table 4).
However, for data in Table 3B, the marginal distribution
of $Z_{(2)}$ in $P_2$ is substantially different from
that in the monotone patterns $P_0$ and $P_1$. In this
case, the unrestricted BCMAR model yields
estimates of $\{\phi_j^{(1)}\}$ from (3.8) that do not lie between
0 and 1. The EM algorithm applied to all the data
is needed to obtain the ML estimates, and the
estimates of $\{\theta_{jk}\}$ are different from those in the
restricted BCMAR and MAR models (Table 5).
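
The contrast between Tables 3A and 3B can be checked numerically with a short sketch. The function names are hypothetical. The noniterative estimates follow the description in the text (conditional probabilities from the complete cases, margins from the cases with $Z_{(1)}$ observed). Equation (3.8) is not reproduced in this section, so the code assumes it takes the form $\sum_j \theta_{jk}(1 - \phi_j^{(1)}) = n_{(2),+k}/(n_2 + n_3)$ for $k = 1, 2$, and solves the resulting 2 × 2 linear system by Cramer's rule.

```python
def noniterative_estimates(n0, n1):
    """Block-monotone reduced estimates using patterns P0 and P1 only."""
    J, K = len(n0), len(n0[0])
    n01 = sum(map(sum, n0)) + sum(n1)
    theta = [[(sum(n0[j]) + n1[j]) / n01 * n0[j][k] / sum(n0[j])
              for k in range(K)] for j in range(J)]
    phi0 = [n1[j] / (sum(n0[j]) + n1[j]) for j in range(J)]
    return theta, phi0


def solve_phi1_2x2(theta, n2, n3):
    """Solve the assumed form of (3.8) for phi_j^(1) when J = K = 2."""
    m = sum(n2) + n3
    b0, b1 = n2[0] / m, n2[1] / m
    # Unknowns u_j = 1 - phi_j^(1); row k of the system uses theta[j][k].
    a11, a12 = theta[0][0], theta[1][0]
    a21, a22 = theta[0][1], theta[1][1]
    det = a11 * a22 - a12 * a21
    u1 = (b0 * a22 - a12 * b1) / det
    u2 = (a11 * b1 - a21 * b0) / det
    return [1 - u1, 1 - u2]


def in_parameter_space(phi1):
    """True when every component lies in [0, 1]."""
    return all(0.0 <= p <= 1.0 for p in phi1)


# Table 3A and Table 3B differ only in the first row of complete cases.
theta_a, phi0_a = noniterative_estimates([[50, 150], [75, 75]], [30, 60])
phi1_a = solve_phi1_2x2(theta_a, [28, 60], 50)
theta_b, phi0_b = noniterative_estimates([[100, 50], [75, 75]], [30, 60])
phi1_b = solve_phi1_2x2(theta_b, [28, 60], 50)
```

When `in_parameter_space` returns `False`, as the text reports for Table 3B, the noniterative estimates are not ML estimates and EM on all the data is needed.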

5.2 Examples with J = 3, K = 2

Tables 6A and 6B give data for the case J = 3, K = 2,
for which the solution set to (3.8) is a straight line.
The parameter space for $\{\phi_j^{(1)}\}$ is a unit cube, as
displayed in Figures 1 and 2. For the data in Table 6A,
the solution line does not intersect the cube (Figure 1),
so ML estimates in the unrestricted BCMAR
model are obtained iteratively using all the data
(Table 7). For the data in Table 6B, the marginal
distribution of $Z_{(2)}$ in $P_2$ is similar to that in $P_0$ and
$P_1$ and the solution line intersects the cube (Figure 2),
and the noniterative estimates obtained by dropping
the data in $P_2$, displayed in Table 8, are the ML
estimates of $\{\theta_{jk}\}$, although there are multiple ML
estimates for $\{\phi_j^{(1)}\}$. ML estimates in the restricted
BCMAR and MAR models are unique for both data
sets in Table 6.
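
Whether the solution line meets the unit cube can be checked without a plot. The sketch below is illustrative only: the function names are hypothetical, and, since (3.8) is not reproduced in this section, it assumes the two equations $\sum_j \theta_{jk}(1 - \phi_j^{(1)}) = n_{(2),+k}/(n_2 + n_3)$, $k = 1, 2$. It writes $u_j = 1 - \phi_j^{(1)}$, treats $u_3$ as a free parameter, expresses $u_1$ and $u_2$ as linear functions of $u_3$, and intersects the ranges of $u_3$ that keep all three coordinates in [0, 1].

```python
def noniterative_theta(n0, n1):
    """Block-monotone reduced estimates of theta_jk from P0 and P1."""
    J, K = len(n0), len(n0[0])
    n01 = sum(map(sum, n0)) + sum(n1)
    return [[(sum(n0[j]) + n1[j]) / n01 * n0[j][k] / sum(n0[j])
             for k in range(K)] for j in range(J)]


def feasible_u3_range(theta, n2, n3):
    """Range of u3 = 1 - phi_3^(1) on the solution line inside the cube.

    Returns (low, high), or None if the line misses the unit cube.
    Written for J = 3, K = 2 only.
    """
    m = sum(n2) + n3
    b0, b1 = n2[0] / m, n2[1] / m
    a11, a12, c1 = theta[0][0], theta[1][0], theta[2][0]
    a21, a22, c2 = theta[0][1], theta[1][1], theta[2][1]
    det = a11 * a22 - a12 * a21
    # u1 = p1 + q1 * u3 and u2 = p2 + q2 * u3 (Cramer's rule).
    p1 = (b0 * a22 - a12 * b1) / det
    q1 = (-c1 * a22 + a12 * c2) / det
    p2 = (a11 * b1 - a21 * b0) / det
    q2 = (-a11 * c2 + a21 * c1) / det
    low, high = 0.0, 1.0
    for p, q in ((p1, q1), (p2, q2)):
        if q == 0:
            if not 0.0 <= p <= 1.0:
                return None
            continue
        t_a, t_b = sorted(((0.0 - p) / q, (1.0 - p) / q))
        low, high = max(low, t_a), min(high, t_b)
    return (low, high) if low <= high else None


theta_6a = noniterative_theta([[100, 50], [75, 75], [32, 67]], [30, 60, 20])
theta_6b = noniterative_theta([[50, 150], [75, 75], [32, 67]], [30, 60, 20])
range_6a = feasible_u3_range(theta_6a, [28, 60], 50)
range_6b = feasible_u3_range(theta_6b, [28, 60], 50)
```

A return value of `None` corresponds to the situation described for Table 6A, and a non-empty interval to the multiple solutions described for Table 6B.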

6. MUSCATINE CORONARY RISK FACTOR 
STUDY 

The Muscatine Coronary Risk Factor Study (MCRF) 
is a longitudinal study of obesity in 4856 school chil- 
dren. Five cohorts (ages 5-7, 7-9, 9-11, 11-13, 13- 
15) of boys and girls were measured for height and 
weight in 1977, 1979 and 1981. Children with rela- 
tive weight greater than 110 percent of the median 
weight for their age-gender-height group were clas- 
sified as obese, and at any time point about 20 per- 
cent of the children were obese. We are interested 
in estimating obesity rates over time and evaluat- 
ing whether or not these rates differ by gender. The 
study was first presented by Woolson and Clarke
(1984), and further analyses can be found in, for
example, Baker (1995), Ekholm and Skinner (1998),
Lipsitz, Parzen and Molenberghs (1998) and
Birmingham and Fitzmaurice (2002).

Fig. 1. Noniterative estimates of $\phi_j^{(1)}$ for data in Table 6A.

Table 4
Estimates of parameters for data in Table 3A

                          Parameter of interest          Nuisance parameter
                          θ11    θ12    θ21    θ22       φ      φ1(0)  φ2(0)  φ1(1)  φ2(1)
Unrestricted BCMAR
  Noniterative estimate   0.131  0.392  0.239  0.239     0.239  0.130  0.286  0.113  0.636
  EM algorithm            0.131  0.392  0.239  0.239     0.239  0.130  0.286  0.113  0.636
Restricted BCMAR, φj(0) = φj(1) = φj, j = 1, 2
                                                         φ      φ1     φ2
  EM algorithm            0.126  0.390  0.238  0.246     0.239  0.157  0.333
Restricted MAR, φ1(1) = φ2(1) = φ(1)
                                                         φ      φ1(0)  φ2(0)  φ(1)
  EM algorithm            0.127  0.398  0.232  0.243     0.239  0.130  0.286  0.362


The analysis is complicated by the study design.
Both cross-sectional and longitudinal information
about age trends in obesity rates is present in the
data. Due to cohort effects, cross-sectional age trends
in obesity rates may be different from longitudinal
trends. Ekholm and Skinner (1998) found no
statistical evidence of cohort effects. Therefore, in
our analyses, cohort effects are assumed negligible
and data are pooled across five age-group cohorts.
In order to simplify the illustration, we only use the
data from the surveys of years 1977 and 1981 (Table 9).

Table 5
Estimates of parameters for data in Table 3B

                          Parameters of interest         Nuisance parameter
                          θ11    θ12    θ21    θ22       φ      φ1(0)  φ2(0)  φ1(1)  φ2(1)
Unrestricted BCMAR
  Noniterative estimate   0.308  0.154  0.269  0.269     0.261  0.167  0.286  2.507  -1.476
  EM algorithm            0.297  0.153  0.236  0.314     0.261  0.167  0.286  0.867  0
Restricted BCMAR, φj(0) = φj(1) = φj, j = 1, 2
                                                         φ      φ1     φ2
  EM algorithm            0.274  0.175  0.242  0.309     0.261  0.197  0.320
Restricted MAR, φ1(1) = φ2(1) = φ(1)
                                                         φ      φ1(0)  φ2(0)  φ(1)
  EM algorithm            0.279  0.174  0.239  0.308     0.261  0.167  0.286  0.362

Table 6
3 × 2 tables with supplemental margins for both variables

                     6A: Z(2)                     6B: Z(2)
Z(1)           1      2    Missing          1      2    Missing
1            100     50       30           50    150       30
2             75     75       60           75     75       60
3             32     67       20           32     67       20
Missing       28     60       50           28     60       50

Table 7
Estimates of parameters for data in Table 6A

                          Parameter of interest                        Nuisance parameter
                          θ11    θ12    θ21    θ22    θ31    θ32       φ      φ1(0)  φ2(0)  φ3(0)  φ1(1), φ2(1), φ3(1)
Unrestricted BCMAR
  Noniterative estimate   0.236  0.118  0.206  0.206  0.076  0.158     0.213  0.167  0.286  0.168  no solution in [0, 1]^3
  EM algorithm            0.235  0.117  0.192  0.219  0.071  0.166     0.213  0.167  0.286  0.168  1, 0.037, 0
Restricted BCMAR, φj(0) = φj(1) = φj, j = 1, 2, 3
                                                                       φ      φ1     φ2     φ3
  EM algorithm            0.218  0.126  0.194  0.224  0.069  0.168     0.213  0.196  0.322  0.190
Restricted MAR, φ1(1) = φ2(1) = φ3(1) = φ(1)
                                                                       φ      φ1(0)  φ2(0)  φ3(0)  φ(1)
  EM algorithm            0.221  0.127  0.190  0.223  0.070  0.169     0.213  0.167  0.286  0.168  0.362

The analysis is further complicated by the substantial
nonresponse. Only 40 percent of children
provided complete records in 1977 and 1981. In addition
to the complete records, there are three nonresponse
patterns, specifically, two patterns with one
missing response and one pattern with two missing
responses. Baker (1995) reported two main reasons
for nonresponse: (1) no parental consent form was
received and (2) the child was not in school on the
examination day. For girls, the missingness of obesity
status in 1981 is found to depend on the missingness
in 1977 using a chi-square test (p-value < 0.0001).
Furthermore, girls measured and classified as obese
in 1977 were more likely to have missing data in
1981 than those classified as nonobese (p-value <
0.0001 based on a chi-square test). The estimates
of girls' obesity rates and missing probabilities in
the BCMAR models discussed above are presented in
Table 10. For the unrestricted BCMAR model, the
estimate from (3.8) of $\{\phi_1^{(1)}, \phi_2^{(1)}\}$ is (0.274, 0.121),
which is in the parameter space, so closed form
estimates of the parameters are available. A bootstrap
approach was used to estimate standard errors. If
a bootstrap sample leads to solutions of $\{\phi_j^{(1)}\}$
from (3.8) that lie outside of the parameter space,
the EM algorithm is used to obtain the ML estimates.
Among the 1000 bootstrap samples, 23.2% of
the samples yield solutions of $\{\phi_j^{(1)}\}$ from (3.8)
that are outside of the parameter space.

Fig. 2. Noniterative estimates of $\phi_j^{(1)}$ for data in Table 6B.

Likelihood ratio tests can be utilized to test the
two submodels discussed above against the more
general unrestricted BCMAR model. Denote the
unrestricted BCMAR model as M1, the restricted
BCMAR model as M2 and the restricted MAR model
in Section 3 as M3, and let $l_{\max}$ represent the
maximized value of the loglikelihood. We find that
$-2(l_{\max}(M2) - l_{\max}(M1)) = -2(-4569.823 + 4535.292) = 69.062$,
which yields a p-value < 0.0001
when compared to $\chi^2_2$. There is strong evidence that
the restricted BCMAR model does not fit the data.
On the other hand, $l_{\max}(M3)$ is close to $l_{\max}(M1)$,
and we cannot differentiate the restricted MAR model
from the unrestricted BCMAR model.
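
The test statistic for M2 against M1 can be reproduced from the reported maximized loglikelihoods. The sketch below is a small illustration with a hypothetical function name. It assumes a chi-square reference distribution with two degrees of freedom, since with J = 2 the restricted BCMAR model has two fewer missingness parameters than the unrestricted model; for two degrees of freedom the upper tail probability is exactly $\exp(-x/2)$, so no statistical library is needed.

```python
import math


def lrt_df2(loglik_restricted, loglik_general):
    """Likelihood ratio statistic and chi-square(2) upper tail probability."""
    stat = -2.0 * (loglik_restricted - loglik_general)
    p_value = math.exp(-stat / 2.0)  # exact survival function for 2 df
    return stat, p_value


# Girls: restricted BCMAR (M2) against unrestricted BCMAR (M1).
stat_girls, p_girls = lrt_df2(-4569.823, -4535.292)
```

The same function does not apply to the comparison of M3 with M1, which differs by a different number of parameters.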

Similarly for the boys, the estimate from (3.8)
of $\{\phi_1^{(1)}, \phi_2^{(1)}\}$ in the unrestricted BCMAR model is
(0.228, 0.325), which is in the parameter space, and
closed form estimates of the parameters are available.
Among 1000 bootstrap samples, only 28 samples
yield solutions of $\{\phi_j^{(1)}\}$ from (3.8) outside
of the parameter space. The likelihood ratio test
yields strong evidence against the restricted
BCMAR model, with $-2(l_{\max}(M2) - l_{\max}(M1)) =
-2(-4748.48 + 4713.03) = 70.9$ on two degrees of
freedom. On the other hand, $l_{\max}(M3)$ is close to
$l_{\max}(M1)$, and the restricted MAR model seems to
be satisfactory (Table 11).

Table 8
Estimates of parameters for data in Table 6B

                          Parameter of interest                        Nuisance parameter
                          θ11    θ12    θ21    θ22    θ31    θ32       φ      φ1(0)  φ2(0)  φ3(0)  φ1(1), φ2(1), φ3(1)
Unrestricted BCMAR
  Noniterative estimate   0.103  0.309  0.188  0.188  0.069  0.144     0.198  0.130  0.286  0.168  multiple solutions in [0, 1]^3
  EM algorithm            0.103  0.309  0.188  0.188  0.069  0.144     0.198  0.130  0.286  0.168  multiple solutions
Restricted BCMAR, φj(0) = φj(1) = φj, j = 1, 2, 3
                                                                       φ      φ1     φ2     φ3
  EM algorithm            0.100  0.307  0.189  0.193  0.067  0.144     0.198  0.154  0.328  0.197
Restricted MAR, φ1(1) = φ2(1) = φ3(1) = φ(1)
                                                                       φ      φ1(0)  φ2(0)  φ3(0)  φ(1)
  EM algorithm            0.101  0.311  0.184  0.190  0.068  0.146     0.198  0.130  0.286  0.168  0.362

Table 9
Tables of data from Muscatine Coronary Risk Factor Study

                           1981
1977              1       2    Missing
Girls
  1             701      98      497
  2              59     111      183
  Missing       408     139      174
Boys
  1             699      98      566
  2              72     116      141
  Missing       473     125      196

Notes: 1 = not obese, 2 = obese.

The choice among the models considered above has only a
small effect on the fitted values of obesity rates and their
standard errors. For boys, the marginal distributions of
1981 obesity rates are quite similar for those with
1977 obesity rates observed or not. If we consider
only the cases with 1977 obesity rates observed, the
noniterative block-monotone reduced ML estimates
of obesity rates for the unrestricted BCMAR model
are ML estimates, and these are close to ML
estimates in the restricted BCMAR and MAR models.
Furthermore, $\hat{\phi}_1^{(0)}$ and $\hat{\phi}_2^{(0)}$ are close to one another,
which suggests a MCAR mechanism. As a consequence,
complete-case estimates of obesity rates are
also similar to those in the three models considered above.
For girls, for the same reason, noniterative
block-monotone reduced ML estimates of obesity rates for
the unrestricted BCMAR model are ML estimates
and are close to those in the restricted BCMAR and
MAR models. However, $\hat{\phi}_1^{(0)}$ and $\hat{\phi}_2^{(0)}$ are quite
different, and, as a consequence, complete-case
estimates of obesity rates are not similar to those in
the other three models.

7. TWO BLOCK BCMAR DATA WITH 
OUTCOMES FROM THE EXPONENTIAL 
FAMILY DISTRIBUTION 

Suppose, as before, that $Z_{(1)}$ takes values $1, \ldots, J$
with probabilities $\theta_j^{(1)}$, where $\sum_j \theta_j^{(1)} = 1$. The model
in Section 3 is generalized here to allow $Z_{(2)}$ to have
an exponential family distribution of full rank. Thus,
we suppose that the density of $Z_{(2)}$ given $Z_{(1)}$ is

$$
f(Z_{(2)} \mid Z_{(1)} = j, \theta_j^{(2)})
= a(Z_{(2)}) \exp\left[c(\theta_j^{(2)}) + t(Z_{(2)})^{T} \theta_j^{(2)}\right],
$$

where $j = 1, \ldots, J$, $\theta_j^{(2)}$ and $t(Z_{(2)})$ are vectors of
dimension $V$, and $c$ is a real-valued function. This
family includes the exponential and normal distributions
(with variance known or unknown) as well
as the multivariate normal, normal linear regression
and generalized linear models with canonical links.
The mean of $t(Z_{(2)})$ given $Z_{(1)} = j$ is given by the
$V$-dimensional vector

$$
\psi_j = E\left[t(Z_{(2)}) \mid Z_{(1)} = j\right] = -\frac{\partial c(\theta_j^{(2)})}{\partial \theta_j^{(2)}}.
$$

Table 10
Estimates of girls' obesity rates

                                  Obesity rate                         Nuisance parameter                            Observed data
                          θ11      θ12      θ21      θ22                                                             loglikelihood
Complete-case estimate    0.723    0.101    0.061    0.115
                         (0.014)  (0.010)  (0.008)  (0.010)
Restricted MAR                                                 φ        φ1(0)    φ2(0)    φ(1)
  EM algorithm            0.685    0.099    0.073    0.143     0.304    0.383    0.518    0.241              -4535.605
                         (0.012)  (0.009)  (0.009)  (0.010)   (0.010)  (0.006)  (0.023)  (0.016)
Restricted BCMAR                                               φ        φ1       φ2
  EM algorithm            0.683    0.103    0.070    0.143     0.304    0.335    0.455                       -4569.823
                         (0.011)  (0.009)  (0.008)  (0.010)   (0.010)  (0.006)  (0.022)
Unrestricted BCMAR                                             φ        φ1(0)    φ2(0)    φ1(1)    φ2(1)
  Noniterative estimate   0.690    0.096    0.074    0.140     0.304    0.383    0.518    0.274    0.121     -4535.292
                         (0.012)  (0.010)  (0.009)  (0.010)   (0.010)  (0.006)  (0.023)  (0.034)  (0.122)

In a random sample $(Z_{i(1)}, Z_{i(2)})$, $i = 1, \ldots, n$, the ML
estimate of $\psi_j$ is $\hat{\psi}_j = \sum_i t(Z_{i(2)}) I(Z_{i(1)} = j) / n_{j+}$,
where $n_{j+}$ is the number of observations with $Z_{i(1)} = j$;
the ML estimate of $\theta_j^{(1)}$ is $\hat{\theta}_j^{(1)} = n_{j+}/n$. The
ML estimates of $\theta_j^{(2)}$ can be obtained from those
for $\psi_j$.
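
For complete data these estimates are just category proportions and within-category averages of the sufficient statistic. The following sketch illustrates this for a scalar statistic; the function name `complete_data_ml` and the toy data are hypothetical, and $t(z) = z$ is assumed (as for a normal outcome with known variance).

```python
from collections import defaultdict


def complete_data_ml(z1, t_values):
    """Complete-data ML estimates of theta_j^(1) and psi_j.

    z1:       category label of Z(1) for each observation
    t_values: scalar sufficient statistic t(Z(2)) for each observation
    """
    n = len(z1)
    counts = defaultdict(int)
    sums = defaultdict(float)
    for j, t in zip(z1, t_values):
        counts[j] += 1
        sums[j] += t
    theta1 = {j: counts[j] / n for j in counts}    # n_{j+} / n
    psi = {j: sums[j] / counts[j] for j in counts}  # mean of t within category j
    return theta1, psi


# Toy data: two categories, five observations.
theta1_hat, psi_hat = complete_data_ml([1, 1, 2, 2, 2], [1.0, 3.0, 2.0, 4.0, 6.0])
```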

We consider as before the missing data structure
illustrated in Table 1 with missingness patterns $P_r$
with $n_r$ observations, for $r = 0, \ldots, 3$. The missingness
parameters $\Phi = (\phi, \phi_j^{(0)}, \phi_j^{(1)}, j = 1, \ldots, J)$ are
defined as before in (3.1). The parameters in the
model are denoted by the triple $(\theta^{(1)}, \theta^{(2)}, \Phi)$.

Table 11
Estimates of boys' obesity rates

                                  Obesity rate                         Nuisance parameter                            Observed data
                          θ11      θ12      θ21      θ22                                                             loglikelihood
Complete-case estimate    0.710    0.099    0.073    0.118
                         (0.015)  (0.010)  (0.008)  (0.010)
Restricted MAR                                                 φ        φ1(0)    φ2(0)    φ(1)
  EM algorithm            0.709    0.097    0.075    0.118     0.319    0.415    0.429    0.247              -4713.142
                         (0.011)  (0.009)  (0.008)  (0.008)   (0.009)  (0.006)  (0.025)  (0.015)
Restricted BCMAR                                               φ        φ1       φ2
  EM algorithm            0.709    0.098    0.075    0.118     0.319    0.360    0.375                       -4748.480
                         (0.011)  (0.009)  (0.008)  (0.008)   (0.009)  (0.005)  (0.023)
Unrestricted BCMAR                                             φ        φ1(0)    φ2(0)    φ1(1)    φ2(1)
  Noniterative estimate   0.707    0.099    0.074    0.120     0.319    0.415    0.429    0.228    0.325     -4713.027
                         (0.013)  (0.009)  (0.008)  (0.009)   (0.009)  (0.006)  (0.025)  (0.037)  (0.153)

In this case, the likelihood contributions in each cell
from the (incomplete) data are as follows:

• For $i \in P_0$, the observed data are $Z_{i(1)}$, $Z_{i(2)}$,
$M_{i(1)} = 0$, $M_{i(2)} = 0$ and the likelihood contribution
is proportional to

$$
L_0(Z_{i(1)} = j, Z_{i(2)}; \theta^{(1)}, \theta^{(2)}, \Phi)
= (1 - \phi)\, \theta_j^{(1)} \exp\left[c(\theta_j^{(2)}) + t(Z_{i(2)})^{T} \theta_j^{(2)}\right] (1 - \phi_j^{(0)}).
$$


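• For $i \in P_1$, the observed data are $Z_{i(1)}$, $M_{i(1)} = 0$,
$M_{i(2)} = 1$, and the likelihood contribution is
proportional to

$$
L_1(Z_{i(1)} = j; \theta^{(1)}, \theta^{(2)}, \Phi) = (1 - \phi)\, \theta_j^{(1)} \phi_j^{(0)}.
$$

• For $i \in P_2$, the observed data are $Z_{i(2)}$, $M_{i(1)} = 1$,
$M_{i(2)} = 0$ and the likelihood contribution is
proportional to

$$
L_2(Z_{i(2)}; \theta^{(1)}, \theta^{(2)}, \Phi)
= \phi \sum_{j} \theta_j^{(1)} \exp\left[c(\theta_j^{(2)}) + t(Z_{i(2)})^{T} \theta_j^{(2)}\right] (1 - \phi_j^{(1)}).
$$

• For $i \in P_3$, no elements of $Z_{(1)}$ or $Z_{(2)}$ are observed
and the data comprise $M_{i(1)} = 1$, $M_{i(2)} = 1$. The
likelihood contribution is proportional to

$$
L_3(\theta^{(1)}, \theta^{(2)}, \Phi) = \phi \sum_{j} \theta_j^{(1)} \phi_j^{(1)}.
$$

The full observed-data likelihood is then the
product of such terms and can be written as
$L = L_0 L_1 L_2 L_3$, where

$$
L_0 = (1 - \phi)^{n_0} \prod_{j=1}^{J} \left\{ (\theta_j^{(1)})^{n_{(0),j+}} (1 - \phi_j^{(0)})^{n_{(0),j+}}
\exp\left[n_{(0),j+}\, c(\theta_j^{(2)}) + T_{0j}^{T} \theta_j^{(2)}\right] \right\},
$$

$$
L_1 = (1 - \phi)^{n_1} \prod_{j=1}^{J} \left\{ (\theta_j^{(1)})^{n_{(1),j+}} (\phi_j^{(0)})^{n_{(1),j+}} \right\},
$$

$L_2$ and $L_3$ are the products of the $P_2$ and $P_3$ contributions above,
and $T_{0j} = \sum_{i \in P_0} t(Z_{i(2)}) I(Z_{i(1)} = j)$.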

An EM algorithm can readily be applied to maximize
the observed-data likelihood. At the E step,
the underlying complete data in patterns $P_2$ and
$P_3$ can be replaced with their conditional expectations,
whereas blocks $P_0$ and $P_1$ can be treated
as complete data. Alternatively, all four patterns
can be incorporated into the EM approach, with
the complete data viewed as all the observations
$(Z_{i(1)}, Z_{i(2)})$, $i = 1, \ldots, n$. For an observation $i \in P_2$,
for example, the expectation step involves calculating

$$
E\left[I(Z_{i(1)} = j) \mid Z_{i(2)}, M_{i(1)} = 1, M_{i(2)} = 0\right]
= \frac{\theta_j^{(1)} \exp\left[c(\theta_j^{(2)}) + t(Z_{i(2)})^{T} \theta_j^{(2)}\right] (1 - \phi_j^{(1)})}
{\sum_{j'} \theta_{j'}^{(1)} \exp\left[c(\theta_{j'}^{(2)}) + t(Z_{i(2)})^{T} \theta_{j'}^{(2)}\right] (1 - \phi_{j'}^{(1)})}.
$$

After missing data in each pattern are filled in from
the E step, the M step computes the simple
estimates given above for complete data.
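
As a concrete illustration of the E step for pattern $P_2$, the sketch below computes the conditional probabilities of $Z_{(1)} = j$ for one observation. It is an example under stated assumptions rather than the authors' code: the outcome is taken to be normal with category-specific mean and common known standard deviation (one member of the exponential family above), and the function names are hypothetical. Each weight is proportional to $\theta_j^{(1)} f(z \mid j) (1 - \phi_j^{(1)})$, matching the $P_2$ likelihood contribution.

```python
import math


def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def e_step_weights_p2(z, theta1, mu, sigma, phi1):
    """P(Z(1) = j | Z(2) = z, M(1) = 1, M(2) = 0) for each category j.

    theta1[j]: marginal probability of Z(1) = j
    mu[j]:     assumed mean of Z(2) given Z(1) = j
    phi1[j]:   probability that Z(2) is missing given Z(1) = j is missing
    """
    weights = [theta1[j] * normal_pdf(z, mu[j], sigma) * (1.0 - phi1[j])
               for j in range(len(theta1))]
    total = sum(weights)
    return [w / total for w in weights]


# Two categories with means -1 and 1; an observation at z = 1 favours the second.
w = e_step_weights_p2(1.0, [0.5, 0.5], [-1.0, 1.0], 1.0, [0.2, 0.2])
```

In the M step these weights play the role of fractional counts when forming the complete-data estimates.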

As in the multinomial case, the block-monotone
reduced ML estimates of the parameters $\theta_j^{(1)}, \theta_j^{(2)}$,
$j = 1, \ldots, J$, are computed from patterns $P_0$, $P_1$,
dropping the data from the other patterns. The
corresponding block-monotone reduced likelihood of
$\theta^{(1)}, \theta^{(2)}$ is

$$
L_{bm}(P_0, P_1) \propto L_0 \times L_1,
$$

where the factors in the parameters $\Phi$ can be ignored
in $L_0$, $L_1$. Unlike the multinomial case, these
block-monotone reduced ML estimates are typically not
full ML estimates, since there is information about
the parameters $\theta_j^{(2)}$ in the excluded patterns.

8. DISCUSSION 

Most of the work on MNAR mechanisms concerns 
selection or pattern-mixture models, and extensions
to include latent random effects that are applicable 
to repeated-measures data (Little, 1995). In this ar- 
ticle we consider block-sequential missing data mod- 
els, where the variables in the data set are divided 
into subsets, and the joint distribution of these variables
and their missing data indicators is factored
as a sequence. A characteristic of this class is that 
distributions of variables and their missing data in- 
dicators are interleaved, and combinations of selec- 
tion and pattern-mixture models can be developed 
within each block. Except for the work of Robins 
(1997), there appears to be very little existing liter- 
ature on missing data mechanisms of this type. 

Here we consider a class of block-sequential missing
data models, which we call block-conditional MAR
models, in which missingness in successive blocks is
allowed to depend on observed variables in the block
and both observed and unobserved data in earlier
blocks. The proposed class is related to the models
with 2 blocks described in Little and Zhang (2011), 
in the context of regression with missing data. A 
block-monotone reduced likelihood approach to esti- 
mating these models is described that yields consis- 
tent asymptotically normal estimates without spec- 
ifying the distribution of the missing-data mecha- 
nism. We examined here the BCMAR model in some 
detail for the case of bivariate categorical data, and 
showed that maximization of the block-monotone 
reduced likelihood can yield fully efficient ML esti- 
mates, when associated estimates of parameters of 
the missing-data mechanism lie inside the parame- 
ter space. We also discussed more briefly the case 
where the variable in the second block comes from 
an exponential family, and inference based on the 
block-monotone reduced likelihood approach is not 
in general fully efficient. In future work we plan to 
study other BCMAR models involving more than 
two blocks, continuous and categorical variables and 
missing data within each block, and fully observed 
covariate information. 

The BCMAR model discussed here is related to 
the "latent ignorable" missing data mechanisms pro- 
posed to model missing data in the presence of non- 
compliance with a treatment (Frangakis and Rubin, 
1999; Peng, Little and Raghunathan, 2004). In these 
cases, there is a binary compliance variable that in- 
dicates whether an individual would comply with a 
treatment if assigned to it. In a clinical trial, this in- 
dicator is fully observed for individuals in the active 
treatment group, but is completely missing for indi- 
viduals in the control group, since they do not have 
access to the active treatment. The latent ignorable 
model assumes MAR within subpopulations defined 
by the compliance indicator. Our BCMAR model, 
applied to that setting, generalizes this structure by 
allowing missing data for the stratifying variable. 

The BCMAR model (1.5) is just one of many pos- 
sible block-sequential missing-data models, obtained 
by placing restrictions on the parameters of the dis- 
tributions in each block. Future work might consider 
properties of models obtained by imposing other pa- 
rameter restrictions, based on plausible assumptions 
about the nature of the missing data. 